Recognition and normalization of disease mentions in PubMed abstracts

Authors

  • Jitendra Jonnagaddala
  • Nai-Wen Chang
  • Toni Rose Jue
  • Hong-Jie Dai
Abstract

The rapidly increasing number of available PubMed documents calls for an automatic approach to identifying and normalizing disease mentions in order to increase the precision and effectiveness of information retrieval. We herein describe our team's participation in the Disease Named Entity Recognition and Normalization subtask of the chemical-disease relations track of the BioCreative V shared task. We developed a CRF-based model using the BIESO tagging format to allow automated recognition of disease entities in PubMed abstracts. Recognized disease entities were normalized to MeSH concepts using a dictionary look-up method based on Lucene. Performance is reported using precision, recall and F-measure on three separate runs. Our best run achieved an F-measure of 80.74% on disease mention recognition and 67.85% on disease normalization.
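
As an illustration of the BIESO tagging format named in the abstract, the following minimal Python sketch converts token-level entity spans into per-token tags (B = begin, I = inside, E = end, S = single-token entity, O = outside). The function name `to_bieso`, the span representation, and the example sentence are assumptions made for illustration; they are not taken from the authors' system, and the CRF model itself is not shown.

```python
def to_bieso(num_tokens, entity_spans):
    """Return one BIESO tag per token.

    entity_spans is a list of (start, end) token indices, end exclusive.
    This span format is an assumption for the sketch.
    """
    tags = ["O"] * num_tokens              # default: outside any entity
    for start, end in entity_spans:
        if end - start == 1:
            tags[start] = "S"              # single-token entity
        else:
            tags[start] = "B"              # first token of the entity
            for i in range(start + 1, end - 1):
                tags[i] = "I"              # interior tokens
            tags[end - 1] = "E"            # last token of the entity
    return tags


# Hypothetical example sentence with two disease mentions.
tokens = ["Patients", "developed", "acute", "renal", "failure",
          "and", "hypotension", "."]
spans = [(2, 5), (6, 7)]                   # "acute renal failure", "hypotension"
tags = to_bieso(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence labeler such as a CRF would be trained to predict these tags from token features; the recognized spans could then be passed to a dictionary look-up for normalization.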

Similar resources

Disease Named Entity Recognition and Normalization using Conditional Random Fields and Levenshtein Distance

This paper presents a machine learning-based approach for the disease named entity recognition and normalization (DNER) subtask of the Chemical Disease Relation (CDR) task in BioCreative V. This approach employs a Conditional Random Fields (CRF) based model with domain-specific features from the biomedical area for disease named entity recognition. In order to improve the performance of entity normalization, the me...

NCBI disease corpus: A resource for disease name recognition and concept normalization

Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information; however, the development of powerful, highly effective tools to automatically detect central biomedical...

DNorm: disease name normalization with pairwise learning to rank

MOTIVATION: Despite the central role of diseases in biomedical research, there have been much fewer attempts to automatically determine which diseases are mentioned in a text, the task of disease name normalization (DNorm), compared with other normalization tasks in biomedical text mining research. METHODS: In this article we introduce the first machine learning approach for DNorm, using the NCBI...

Human Gene Name Normalization using Text Matching with Automatically Extracted Synonym Dictionaries

The identification of genes in biomedical text typically consists of two stages: identifying gene mentions and normalization of gene names. We have created an automated process that takes the output of named entity recognition (NER) systems designed to identify genes and normalizes them to standard referents. The system identifies human gene synonyms from online databases to generate an extensi...

Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion

The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditio...

Publication date: 2015